[1] 2.452172e+139
Edward Vytlacil
Review: Hypothesis Testing
Randomization Inference
Applying Randomization Inference to PROGRESA
Multiple hypothesis testing
Definition: Null and Alternative Hypotheses
Null Hypothesis \(\mathbb{H}_0\),
Alternative Hypothesis \(\mathbb{H}_1\)
Definition: Test Statistic
A test statistic \(T_n\) is a function of our sample that summarizes evidence against the null, with larger values indicating stronger evidence against the null.
Definition: p-value
The p-value is the probability under the null of observing a test statistic as large or larger as the one observed in the sample.
Definition: Hypothesis Test
A hypothesis test is a decision rule that specifies for which values of the test statistic \(T_n\) we accept or fail to accept the null hypothesis.
Definition: Type I & II Errors
Type I Error: Reject null when null is true
Type II Error: Fail to reject null when null is false
Definition: Significance Level of Test
A test has significance level \(\alpha\) if probability of Type I error when null hypothesis is true is less than or equal to \(\alpha\).
Definition: Significance Level of Test
A test has significance level \(\alpha\) if probability of Type I error when null hypothesis is true is less than or equal to \(\alpha\).
Theorem:
Suppose \(p_n\) is a p-value for the null hypothesis \(\mathbb{H}_0\). Then the test that rejects the null whenever \(p_n \le \alpha\) has significance level \(\alpha\).
Review: Hypothesis Testing
Randomization Inference
Applying Randomization Inference to PROGRESA
Multiple hypothesis testing
In this example,
| \(i\) | \(X_i\) | \(Y_i\) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 0 |
| 4 | 1 | 1 |
| 5 | 0 | 1 |
\(\bar{Y}_1 -\bar{Y}_0 = \frac{1}{2}\), but is difference just by random chance?
For any inference procedure:
Formally state null and alternative hypotheses,
Choose a test statistic to measure evidence against the null hypothesis,
Calculate distribution of test statistic under the null to construct p-values, perform hypothesis test.
Definition: Weak Null
\(\mathbb{H}_0: \mathbb{E}[ Y_{1,i} - Y_{0,i}] =0\), vs
\(\mathbb{H}_1: \mathbb{E}[ Y_{1,i} - Y_{0,i}] \ne 0.\)
Null of no effect on average,
versus alternative of some effect on average.
Commonly tested with t-test, based on asymptotic normality.
Definition: Strong Null
\(\mathbb{H}_0: Y_{1,i} - Y_{0,i}=0\) for all \(i\), vs.
\(\mathbb{H}_1: Y_{1,i} - Y_{0,i} \ne 0\) for at least one \(i\).
Null of no effect on any subject,
versus alternative of some effect on someone.
Randomization inference will test strong null as we can find distribution of test statistic under sharp null exploiting known randomization procedure.
| \(i\) | \(Y_{0,i}\) | \(Y_{1,i}\) | \(X_i\) | \(Y_i\) |
|---|---|---|---|---|
| 1 | 1 | 1 | ||
| 2 | 1 | 1 | ||
| 3 | 0 | 0 | ||
| 4 | 1 | 1 | ||
| 5 | 0 | 1 |
If \(X_i=1\), then
\(Y_{1,i}=Y_{i}\)
We do not know \(Y_{0,i}\)
If \(X_i=0\), then
\(Y_{0,i}=Y_{i}\)
We do not know \(Y_{i}\)
| \(i\) | \(Y_{0,i}\) | \(Y_{1,i}\) | \(X_i\) | \(Y_i\) |
|---|---|---|---|---|
| 1 | ? | 1 | 1 | 1 |
| 2 | ? | 1 | 1 | 1 |
| 3 | 0 | ? | 0 | 0 |
| 4 | ? | 1 | 1 | 1 |
| 5 | 1 | ? | 0 | 1 |
However, under the sharp null, \(Y_{1,i}=Y_{0,i}\) for all \(i\), . . .
Under sharp null:
\(Y_i = Y_{1,i}=Y_{0,i}\) for all \(i\).
Under sharp null hypothesis, we can fill in missing elements of the table.
| \(i~~~~\) | \(Y_{0,i}\) | \(Y_{1,i}\) | \(X_i\) | \(Y_i\) |
|---|---|---|---|---|
| 1 | (1) | 1 | 1 | 1 |
| 2 | (1) | 1 | 1 | 1 |
| 3 | 0 | (0) | 0 | 0 |
| 4 | (1) | 1 | 1 | 1 |
| 5 | 1 | (1) | 0 | 1 |
Randomization inference: exploit randomization procedure and that we know both \(Y_{0,i}\) and \(Y_{1,i}\) under the null to find distribution of test statistic under the null, and thus to find p-values and test the null.
There are \({5 \choose 3}=10\) ways to assign treatment to \(3\) out of \(5\) subjects.
Under sharp null, we can calculate value of test statistic for each way to assign treatment.
Suppose randomization procedure puts equal probability on each possible way to assign treatment. Then we can find distribution of test statistic under sharp null.
| \(i\) | \(Y_i\) | \(X_1\) | \(X_2\) | \(X_3\) | \(X_4\) | \(X_5\) | \(X_6\) | \(X_7\) | \(X_8\) | \(X_9\) | \(X_{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 5 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| \(T_N=\) | 1/3 | 1/2 | 1/2 | 1/3 | 1/3 | 1/2 | 1/3 | 1/3 | 1/2 | 1/3 |
Under sharp null, we know value of \(T_N\) for each possible treatment assignment. Using that randomization procedure makes each possible treatment assignment equally likely, we thus know distribution of \(T_N\) under the sharp null.
| \(i\) | \(Y_i\) | \(X_1\) | \(X_2\) | \(X_3\) | \(X_4\) | \(X_5\) | \(X_6\) | \(X_7\) | \(X_8\) | \(X_9\) | \(X_{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 5 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
| \(T_N=\) | 1/3 | 1/2 | 1/2 | 1/3 | 1/3 | 1/2 | 1/3 | 1/3 | 1/2 | 1/3 |
Under sharp null:
\(\Pr[T_N = 1/3] = 6/10\).
\(\Pr[T_N = 1/2] = 4/10\).
If \(T_N=1/3\), p-value\(=1\).
If \(T_N=1/2\), p-value\(=0.4\).
Review: Hypothesis Testing
Randomization Inference
Applying Randomization Inference to PROGRESA
Multiple hypothesis testing
Complication:
Complication:
Complication:
Randomization inference on PROGRESA Sample
term estimate two_tailed_p_value
1 Custom Test Statistic 0.03923947 6e-04
library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outri2 to perform randomization inference ri2 documentation).library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outdeclare_ra from package ri2 to declare that randomization was at village level, with each of the ways to assign treatment to m.0 out of n.0 villages equally likely.library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outdeclare_ra from package ri2 to declare that randomization was at village level, with each of the ways to assign treatment to m.0 out of n.0 villages equally likely.library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outRandom assignment procedure: Cluster random assignment
Number of units: 14996
Number of clusters: 491
Number of treatment arms: 2
The possible treatment categories are 0 and 1.
The number of possible random assignments is approximately infinite.
The probabilities of assignment are constant across units:
prob_0 prob_1
0.3727088 0.6272912
library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outusing function conduct_ri from package ri2 to perform randomization inference.
test_function is defining test statistic, here \(T_N = |\bar{Y}_1-\bar{Y}_0|\).
library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outdeclaration is defining randomization procedure, defined above with declare_ra function.library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_outassignment is defining treatment variable.
outcome is defining outcome variable.
sims is number of random draws of possible treatment assignments.
library(ri2, quietly = TRUE)
#declaring randomization procedure
n.0<- length(unique(dfPost$sooloca))
m.0<- length(unique(subset(dfPost,
dfPost$treat==1)$sooloca))
declaration_ra<-declare_ra(N=n.0,
m=m.0,clusters=dfPost$sooloca)
declaration_ra
# perform randomization inference
ri_out <- conduct_ri(
test_function = function(d){
abs(mean(d[d$treat==1,]$school)
-mean(d[d$treat==0,]$school))},
declaration = declaration_ra,
assignment = "treat",
outcome = "school",
data = dfPost,
sims = 10000
)
# showing results
plot(ri_out)
ri_out term estimate two_tailed_p_value
1 Custom Test Statistic 0.03923947 6e-04
Randomization inference on Sample of Boys
p-value: 0.0124
p-value \(< 0.05\), so reject at \(0.05\)-significance level.
Randomization inference on Sample of Girls
p-value: 0.0012
p-value \(< 0.05\), so reject at \(0.05\)-significance level.
| Grade | boys | girls |
| 1 | 0.349 | 0.609 |
| 2 | 0.006 | 0.926 |
| 3 | 0.702 | 0.337 |
| 4 | 0.0003 | 0.022 |
| 5 | 0.294 | 0.077 |
| 6 | 0.002 | 0.008 |
| 7 | 0.011 | 0.0003 |
| 8 | 0.564 | 0.634 |
| 9 | 0.285 | 0.681 |
| 10 | 0.264 | 0.667 |
Randomization inference conditional on sex and grade of child:
Showing p-values for sharp null of no effect.
How would you summarize the evidence of whether PROGRESA has an effect, and on whom?
Possible concern: data mining.
Review: Hypothesis Testing
Randomization Inference
Applying Randomization Inference to PROGRESA
Multiple hypothesis testing
Suppose our goal is to determine for which (if any) of the grades does the treatment have an effect, i.e., we want to test sharp null for each of the 10 grades.
Definition: FWER
Definition: FWER
Instead of testing so that the probability of falsely rejecting each null is at most \(\alpha\), we we can design our testing procedure so that the FWER is at most \(\alpha\).
Let \(\widehat p_j\) denote the p-value for testing \(\mathbb{H}_{0,j}\). Multiple hypothesis testing procedures typically use as inputs \((\widehat p_{1},...,\widehat p_{J})\), and “correct” the p-values for the multiplicity of hypotheses.
The simplest such procedure is the Bonferonni Procedure.
Definition: Bonferonni Procedure
Reject those \(\mathbb{H}_{0,j}\) for which \(J \times \widehat{p}_j \le \alpha\)
\(J \times \widehat p_j\) is p-value corrected for multiple hypothesis testing.
Can implement in R using function p.adjust with option method="bonferroni".
p.adjust(p, method="bonferroni") returns vector of p-values with Bonferroni correction.Theorem
The Bonferonni Procedure results in a FWER \(\le \alpha\).
Definition: Holm’s Procedure
Suppose there are \(J\) hypotheses.
Rank your p-values from smallest to largest. Let \(k\) index the ranks.
Start with \(k=1\).
Compute \(\frac{\alpha}{J+1-k}\). This is your critical value for rank \(k\).
If \(\widehat p_k\) is less than this critical value, reject hypothesis \(k\). Otherwise, do not reject hypothesis \(k\).
If hypothesis \(k\) was rejected, repeat step (3) for \(k+1\). If hypothesis \(k\) was not rejected, the process ends here, and all other hypotheses are not rejected.
p.adjust using option method="holm".
p.adjust(p, method="holm") returns vector of p-values with Holm’s correction.Theorem The Holm’s Procedure results in a FWER \(\le \alpha\), and rejects at least as many false null hypotheses as the Bonferroni procedure.
| Uncorrected | Bonferroni | Holm’s | |
| 1 | 0.349 | 1 | 1 |
| 2 | 0.006 | 0.062 | 0.050 |
| 3 | 0.702 | 1 | 1 |
| 4 | 0.0003 | 0.003 | 0.003 |
| 5 | 0.294 | 1 | 1 |
| 6 | 0.002 | 0.023 | 0.021 |
| 7 | 0.011 | 0.109 | 0.076 |
| 8 | 0.564 | 1 | 1 |
| 9 | 0.285 | 1 | 1 |
| 10 | 0.264 | 1 | 1 |
| Uncorrected | Bonferroni | Holm’s | |
| 1 | 0.609 | 1 | 1 |
| 2 | 0.926 | 1 | 1 |
| 3 | 0.337 | 1 | 1 |
| 4 | 0.022 | 0.225 | 0.180 |
| 5 | 0.077 | 0.768 | 0.538 |
| 6 | 0.008 | 0.077 | 0.069 |
| 7 | 0.0003 | 0.003 | 0.003 |
| 8 | 0.634 | 1 | 1 |
| 9 | 0.681 | 1 | 1 |
| 10 | 0.667 | 1 | 1 |
Econ 201